DEEP LEARNING IN IMAGE RECOGNITION: A COMPARATIVE REVIEW OF ARCHITECTURES AND MODELS
Keywords:
Deep Learning, Image Recognition, Convolutional Neural Networks (CNNs), Residual Networks (ResNets), Vision Transformers (ViTs), Transfer Learning, Performance Metrics, Data Augmentation, Regularization Techniques, Benchmark DatasetsAbstract
Deep learning has revolutionized image recognition, providing state-of-the-art performance across various applications, from medical diagnostics to autonomous vehicles. This comparative review explores the evolution of deep learning architectures and models used in image recognition. We categorize and analyze prominent architectures, including Convolutional Neural Networks (CNNs), Residual Networks (ResNets), Inception Networks, and more recent developments like Vision Transformers (ViTs). The review highlights key features, strengths, and limitations of each architecture while discussing their performance metrics in standard benchmark datasets such as ImageNet, CIFAR-10, and MNIST. Additionally, we examine the impact of transfer learning, data augmentation, and regularization techniques on model performance. By synthesizing current research, this review aims to provide insights into selecting appropriate architectures for specific image recognition tasks and identifies future research directions to enhance the capabilities of deep learning models in this domain.
Downloads
References
Alexey, K., & Vincent, Y. (2015). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386
Chollet, F. (2017). Deep learning with Python. Manning Publications.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://doi.org/10.1109/CVPR.2016.90
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2261-2269). https://doi.org/10.1109/CVPR.2017.243
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
LeCun, Y., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440). https://doi.org/10.1109/CVPR.2015.7298965
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (Vol. 27, pp. 807-814).
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788). https://doi.org/10.1109/CVPR.2016.9
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations. https://arxiv.org/abs/1409.1556
Szegedy, C., Vanhoucke, V., Vinyals, O., & Google, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9). https://doi.org/10.1109/CVPR.2015.7298594
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (Vol. 97, pp. 6105-6114). https://arxiv.org/abs/1905.11946
Vaswani, A., Shard, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Kaiser, Ł. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008). https://arxiv.org/abs/1706.03762
Weng, J., Cheng, Y., & Zhao, L. (2018). Deep learning for image classification: A comprehensive review. Journal of Computer Science and Technology, 33(4), 705-726. https://doi.org/10.1007/s11390-018-1824-2
Zhang, K., Zhang, Z., & Chen, Y. (2016). A survey on deep learning-based image recognition. Journal of Computer Science and Technology, 31(1), 85-108. https://doi.org/10.1007/s11390-016-1610-0
Zhang, Y., Song, L., & Wei, X. (2019). Transfer learning for image classification: A survey. IEEE Transactions on Neural Networks and Learning Systems, 30(5), 1357-1377. https://doi.org/10.1109/TNNLS.2018.2810981
Zhao, H., Shi, J., Qi, X., Wang, Z., & Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6230-6239). https://doi.org/10.1109/CVPR.2017.623
Zhou, K., Wang, H., & Zhao, X. (2019). A brief review of deep learning for image classification. Journal of Physics: Conference Series, 1396(1), 012023. https://doi.org/10.1088/1742-6596/1396/1/012023
Zhuang, F., et al. (2019). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43-76. https://doi.org/10.1109/JPROC.2020.2979930
Zhang, Y., & Xu, B. (2019). A comprehensive review on image recognition with deep learning. Neural Computing and Applications, 32(5), 1551-1563. https://doi.org/10.1007/s00500-018-3774-8
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2023 IEJRD

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.















